Introduction: This analysis performs a gene-gene clustering procedure that identifies clusters of co-expressed genes across multiple sample groups. It first runs an ANOVA to find genes that change significantly across sample groups and uses these genes as seeds to initiate a number of gene clusters. These clusters are then refined based on several user-specified parameters. Finally, gene set enrichment analysis is used to find predefined gene sets that are over-represented in each cluster.
Comparison between cell lines from 9 different cancer tissues (NCI-60); GSE5949
Reinhold WC, Reimers MA, Lorenzi P, Ho J et al. Multifactorial regulation of E-cadherin expression: an integrative study. Mol Cancer Ther 2010 Jan;9(1):1-16. PMID: 20053763.
Comparison between cell lines from 9 different cancer tissue of origin types (Breast, Central Nervous System, Colon, Leukemia, Melanoma, Non-Small Cell Lung, Ovarian, Prostate, Renal) from NCI-60 panel
Cluster genes co-expressed across 9 tissues/organs
As a demonstration, only a subset of genes from the original data with high between-sample variance was used.
The input data matrix was normalized so that each gene had a mean of 0 and an SD of 1.0.
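The per-gene normalization can be illustrated with a minimal sketch. It assumes genes are in rows and samples in columns; the matrix `expr` is simulated for illustration and is not the report's data.

```r
# Simulated expression matrix: 5 genes (rows) x 6 samples (columns)
set.seed(1)
expr <- matrix(rnorm(30, mean = 8, sd = 2), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("sample", 1:6)))

# scale() standardizes columns, so transpose, scale, and transpose back
normalized <- t(scale(t(expr)))

# Each gene (row) should now have mean 0 and SD 1
round(rowMeans(normalized), 10)
apply(normalized, 1, sd)
```

After this step, values are comparable across genes because each one is expressed in units of that gene's own standard deviation.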
Summary statistics and ANOVA p value across all sample groups were calculated for each gene.
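A one-way ANOVA per gene can be sketched as follows. This is an illustration on simulated data with three made-up sample groups, not the code used by the report; the object names (`expr`, `groups`, `anova_p`) are assumptions.

```r
set.seed(2)
groups <- factor(rep(c("Breast", "CNS", "Colon"), each = 4))
expr <- matrix(rnorm(10 * 12), nrow = 10,
               dimnames = list(paste0("gene", 1:10), NULL))

# Make gene1 clearly different in one group so it stands out
expr[1, groups == "Colon"] <- expr[1, groups == "Colon"] + 5

# Summary statistics: mean of each gene within each sample group (genes x groups)
group_means <- t(apply(expr, 1, function(x) tapply(x, groups, mean)))

# One-way ANOVA p value for each gene
anova_p <- apply(expr, 1, function(x) {
  fit <- aov(x ~ groups)
  summary(fit)[[1]][["Pr(>F)"]][1]
})

head(sort(anova_p))
```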
Differentially expressed genes (DEGs) were selected as seeds for generating gene clusters, using the following criteria:
A total of 1000 genes were selected. These genes were used as seeds to generate gene clusters in the next step.
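The selection criteria themselves are not listed in this text. Purely as an illustration of how seeds could be picked from ANOVA results, the sketch below keeps genes under a p value cutoff and caps the number of seeds; both `p_cutoff` and `max_seeds` are hypothetical values, and the p values are simulated.

```r
set.seed(3)
anova_p <- setNames(runif(50), paste0("gene", 1:50))  # stand-in p values

p_cutoff  <- 0.2  # hypothetical significance cutoff
max_seeds <- 5    # hypothetical cap on the number of seeds

# Keep genes passing the cutoff, then take the most significant ones up to the cap
passing <- anova_p[anova_p <= p_cutoff]
seeds   <- names(sort(passing))[seq_len(min(max_seeds, length(passing)))]

seeds
```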
Gene clusters were identified from the DEG seeds with the following steps:
10 gene clusters of 525 genes were identified from 1000 seed DEGs.
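The individual clustering steps are not listed in this text. One common way to group co-expressed genes, shown here only as a sketch and not as the report's actual procedure, is hierarchical clustering on a correlation-based distance. The data, the linkage method, and the number of clusters `k` are all assumptions.

```r
set.seed(4)
# Two simulated expression patterns across 12 samples
pattern_a <- rep(c(1, -1), each = 6)
pattern_b <- rep(c(-1, 1, -1), each = 4)
expr <- rbind(
  t(replicate(5, pattern_a + rnorm(12, sd = 0.3))),
  t(replicate(5, pattern_b + rnorm(12, sd = 0.3)))
)
rownames(expr) <- paste0("gene", 1:10)

# Distance between genes = 1 - Pearson correlation of their profiles
d <- as.dist(1 - cor(t(expr)))

# Average-linkage hierarchical clustering, cut into k = 2 clusters
tree <- hclust(d, method = "average")
clusters <- cutree(tree, k = 2)

split(names(clusters), clusters)
```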
The gene clusters identified from the DEG seeds were further refined with the following steps:
The reclustering converged after 7 cycles.
A total of 533 genes were clustered after refinement.
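The refinement steps are not listed in this text. The idea of reclustering until convergence can be sketched, under assumptions, as repeatedly reassigning each gene to the cluster whose mean profile it correlates with best and stopping when no assignment changes. The data, the starting assignment, and `max_cycles` are hypothetical, and this sketch does not add or drop genes as the report's refinement appears to do.

```r
set.seed(5)
pattern_a <- rep(c(1, -1), each = 6)
pattern_b <- rep(c(-1, 1, -1), each = 4)
expr <- rbind(
  t(replicate(6, pattern_a + rnorm(12, sd = 0.3))),
  t(replicate(6, pattern_b + rnorm(12, sd = 0.3)))
)
rownames(expr) <- paste0("gene", 1:12)

# Deliberately imperfect start: gene6 follows pattern_a but sits in cluster 2
assignment <- c(rep(1, 5), 2, rep(2, 6))
max_cycles <- 20  # hypothetical safety limit

for (cycle in seq_len(max_cycles)) {
  # Centroid = mean profile of the genes currently assigned to each cluster
  centroids <- t(sapply(1:2, function(k) {
    colMeans(expr[assignment == k, , drop = FALSE])
  }))
  # Correlate every gene with every centroid and pick the best match
  r <- cor(t(expr), t(centroids))
  new_assignment <- max.col(r, ties.method = "first")
  if (all(new_assignment == assignment)) break  # no change: converged
  assignment <- new_assignment
}

table(assignment)
```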
More info:
| Cluster | Num_Gene | Mean_Breast | Mean_CNS | Mean_Colon | Mean_Blood | Mean_Skin | Mean_Lung | Mean_Ovary | Mean_Prostate | Mean_Kidney |
|---|---|---|---|---|---|---|---|---|---|---|
| Cluster_1 | 24 | 0.430 | 1.7071 | -0.6095 | -0.4794 | -0.3584 | -0.027 | -0.240 | -0.2200 | 0.0853 |
| Cluster_2 | 82 | -0.017 | -0.5071 | 1.6785 | -0.3899 | -0.4641 | -0.062 | 0.097 | 0.0770 | -0.2399 |
| Cluster_3 | 57 | -0.160 | -0.2906 | -0.1920 | 2.0391 | -0.1890 | -0.250 | -0.230 | -0.3100 | -0.2475 |
| Cluster_4 | 16 | -0.370 | -0.5177 | 1.1083 | 1.0814 | -0.3114 | -0.200 | -0.200 | -0.2300 | -0.3166 |
| Cluster_5 | 24 | -0.190 | -0.1019 | 0.4968 | 1.2466 | -0.7612 | -0.140 | 0.150 | -0.2000 | -0.1535 |
| Cluster_6 | 47 | -0.067 | 0.3230 | 0.0223 | -1.8258 | 0.1561 | 0.300 | 0.230 | 0.1500 | 0.3767 |
| Cluster_7 | 156 | -0.310 | -0.1729 | -0.2899 | -0.2985 | 1.4933 | -0.300 | -0.370 | -0.2900 | -0.3311 |
| Cluster_8 | 97 | 0.260 | 0.6147 | -0.7912 | -0.9946 | -0.3664 | 0.190 | 0.310 | 0.0076 | 0.7872 |
| Cluster_9 | 30 | -0.310 | -0.2642 | -0.3230 | -0.5247 | -0.3492 | -0.160 | 0.270 | -0.2400 | 1.5013 |
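A summary table of this shape can be derived by averaging the normalized expression of each cluster's genes within each sample group. The sketch below uses simulated data and assumes that samples are averaged per group first and genes per cluster second; the report may compute its means differently.

```r
set.seed(6)
groups <- factor(rep(c("Breast", "CNS", "Colon"), each = 3))
expr <- matrix(rnorm(6 * 9), nrow = 6,
               dimnames = list(paste0("gene", 1:6), NULL))
cluster <- c(gene1 = "Cluster_1", gene2 = "Cluster_1", gene3 = "Cluster_1",
             gene4 = "Cluster_2", gene5 = "Cluster_2", gene6 = "Cluster_2")

# Average samples within each group, per gene (genes x groups)
gene_by_group <- t(apply(expr, 1, function(x) tapply(x, groups, mean)))

# Sum genes within each cluster, then divide by cluster size (clusters x groups)
cluster_size  <- as.vector(table(cluster))
cluster_means <- rowsum(gene_by_group, cluster[rownames(gene_by_group)]) / cluster_size

summary_table <- data.frame(Cluster = rownames(cluster_means),
                            Num_Gene = cluster_size,
                            cluster_means)
summary_table
```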
Find predefined gene sets that are enriched in each gene cluster compared to the background.
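Over-representation of a gene set in a cluster relative to the background is commonly tested with a hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The sketch below assumes that approach; the gene identifiers and the gene set are made up, and the report's own test may differ.

```r
background    <- paste0("gene", 1:100)          # all genes considered
cluster_genes <- paste0("gene", 1:20)           # genes in one cluster
gene_set      <- paste0("gene", c(1:10, 51:55)) # hypothetical predefined gene set

overlap <- length(intersect(cluster_genes, gene_set))

# P(X >= overlap) when drawing length(cluster_genes) genes from the background
p_value <- phyper(overlap - 1,
                  m = length(gene_set),
                  n = length(background) - length(gene_set),
                  k = length(cluster_genes),
                  lower.tail = FALSE)

# Same test as a one-sided Fisher's exact test on the 2x2 table
# (rows: in set / not in set; columns: in cluster / not in cluster)
tab <- matrix(c(overlap,
                length(cluster_genes) - overlap,
                length(gene_set) - overlap,
                length(background) - length(cluster_genes) -
                  length(gene_set) + overlap),
              nrow = 2)
fisher_p <- fisher.test(tab, alternative = "greater")$p.value

c(hypergeometric = p_value, fisher = fisher_p)
```

In practice this test would be repeated for every gene set and every cluster, followed by a multiple-testing correction such as `p.adjust(p_values, method = "BH")`.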
Check out the RoCA home page for more information.
To reproduce this report:
Find the data analysis template you want to use and an example of its paired YAML file here, and download the YAML example to your working directory.
To generate a new report using your own input data and parameters, edit the following items in the YAML file:
Run the code below in the R console or RStudio, preferably in a new R session:
if (!require(devtools)) { install.packages('devtools'); require(devtools); }
if (!require(RCurl)) { install.packages('RCurl'); require(RCurl); }
if (!require(RoCA)) { install_github('zhezhangsh/RoCAR'); require(RoCA); }
CreateReport(filename.yaml); # filename.yaml is the YAML file you just downloaded and edited for your analysis
If the code runs without errors, go to the output folder and open the index.html file to view the report.
## R version 3.2.2 (2015-08-14)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.10.5 (Yosemite)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] splines stats4 parallel stats graphics grDevices utils
## [8] datasets methods base
##
## other attached packages:
## [1] CHOPseq_0.0.0.9000 Agri_0.0.0.9000 edgeR_3.10.2
## [4] limma_3.26.9 NOISeq_2.16.0 GenomicRanges_1.22.4
## [7] GenomeInfoDb_1.6.3 IRanges_2.4.8 S4Vectors_0.8.11
## [10] Biobase_2.28.0 BiocGenerics_0.16.1 Matrix_1.2-2
## [13] vioplot_0.2 sm_2.2-5.4 rchive_0.0.0.9000
## [16] htmlwidgets_0.5 DT_0.1 GtUtility_0.0.0.9000
## [19] gplots_3.0.1 awsomics_0.0.0.9000 yaml_2.1.13
## [22] rmarkdown_0.9.6 knitr_1.12.3 RoCA_0.0.0.9000
## [25] RCurl_1.95-4.8 bitops_1.0-6 devtools_1.11.1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.4 XVector_0.10.0 formatR_1.3
## [4] highr_0.5.1 zlibbioc_1.14.0 tools_3.2.2
## [7] digest_0.6.9 lattice_0.20-33 jsonlite_0.9.20
## [10] evaluate_0.9 memoise_1.0.0 RSQLite_1.0.0
## [13] DBI_0.3.1 withr_1.0.1 stringr_1.0.0
## [16] gtools_3.5.0 caTools_1.17.1 grid_3.2.2
## [19] AnnotationDbi_1.32.3 gdata_2.17.0 magrittr_1.5
## [22] htmltools_0.3.5 KernSmooth_2.23-15 stringi_1.0-1
END OF DOCUMENT